Model Selection

Zero-shot Classification

# Zero-shot Classification

Cropvision CLIP

A vision-language model fine-tuned based on the CLIP architecture, specifically designed for zero-shot classification of plant diseases

Image Classification English

Bge Reranker V2 M3 Q5 K M GGUF

This model is converted from BAAI/bge-reranker-v2-m3 into GGUF format using llama.cpp via ggml.ai's GGUF-my-repo space, primarily for text classification tasks.

Text Embedding Other

Marqo Fashionsiglip ST

Marqo-FashionSigLIP is a multimodal embedding model optimized for fashion product search, achieving a 57% improvement in MRR and recall rate compared to FashionCLIP.

Transformers English

Drama Large Xnli Anli

A zero-shot classification model fine-tuned on XNLI and ANLI datasets based on facebook/drama-large, supporting natural language inference tasks in 15 languages.

Large Language Model Supports Multiple Languages

Clip Backdoor Rn50 Cc3m Badnets

This is a pre-trained backdoor-injected model for studying backdoor sample detection in contrastive language-image pretraining.

Text-to-Image English

Gte Multilingual Base Xnli Anli

This model is a fine-tuned version of Alibaba-NLP/gte-multilingual-base on the XNLI and ANLI datasets, supporting multilingual natural language inference tasks.

Text Classification Supports Multiple Languages

Gte Multilingual Base Xnli

This model is a fine-tuned version of Alibaba-NLP/gte-multilingual-base on the XNLI dataset, supporting multilingual natural language inference tasks.

Text Classification Supports Multiple Languages

Clip Vit Base Patch32 Lego Brick

A CLIP-based fine-tuned model for LEGO brick image-text matching, specifically designed to recognize LEGO bricks and their descriptions.

Transformers English

ConceptCLIP is a large-scale vision-language pre-training model enhanced with medical concepts, suitable for various medical imaging modalities, capable of achieving robust performance across multiple medical imaging tasks.

Transformers English

Vit Large Patch14 Clip 224.laion2b

Vision Transformer model based on CLIP architecture, specialized in image feature extraction

Image Classification

Microsoft Git Base

GIT is a Transformer-based generative image-to-text model capable of converting visual content into textual descriptions.

Image-to-Text Supports Multiple Languages

Aimv2 Large Patch14 224 Lit

AIMv2 is a series of vision models pretrained with multimodal autoregressive objectives, demonstrating outstanding performance across multiple multimodal understanding benchmarks.

LLM2CLIP Llama 3 8B Instruct CC Finetuned

LLM2CLIP is an innovative approach that enhances CLIP's cross-modal capabilities through large language models, significantly improving the discriminative power of visual and text representations.

Multimodal Fusion

A multilingual vision-language pre-trained model for the remote sensing field, supporting image-text cross-modal tasks in 10 languages.

Image-to-Text Supports Multiple Languages

Marqo Fashionsiglip

A fine-tuned fashion multimodal retrieval model based on ViT-B-16-SigLIP, specializing in fashion product search

Text-to-Image English

Marqo Fashionclip

Marqo-FashionCLIP is a fashion-domain multimodal retrieval model based on the CLIP architecture, achieving state-of-the-art performance in fashion product search tasks through generalized contrastive learning.

Transformers English

A large-scale vision-language model based on Vision Transformer architecture, supporting cross-modal understanding between images and text

Vit L 16 HTxt Recap CLIP

A CLIP model trained on the Recap-DataComp-1B dataset using LLaMA-3 generated captions, suitable for zero-shot image classification tasks

Clip ViT B 32 Vision

ONNX ported version based on CLIP ViT-B/32 architecture, suitable for image classification and similarity search tasks.

Image Classification

Bert Base Japanese V3 Nli Jsnli Jnli Jsick

A Japanese natural language inference cross-encoder trained on tohoku-nlp/bert-base-japanese-v3, supporting entailment, neutral, and contradiction judgments

Text Classification Supports Multiple Languages

Clip Japanese Base

A Japanese CLIP model developed by LY Corporation, trained on approximately 1 billion web-collected image-text pairs, suitable for various vision tasks.

Transformers Japanese

line-corporation

BERTurk-Legal is a Transformer-based language model specifically designed for prior case retrieval tasks in the Turkish legal domain.

Large Language Model

Transformers Other

Bert Base Japanese V3 Nli Jsnli

A Japanese natural language inference model based on BERT architecture, trained on the JSNLI dataset, used to determine logical relationships (entailment/neutral/contradiction) between sentence pairs

Text Classification Supports Multiple Languages

Roberta Base Zeroshot V2.0 C

A zero-shot classification model based on the RoBERTa architecture, designed for text classification tasks without requiring training data, supports both GPU and CPU operation, and is trained using fully business-friendly data.

Text Classification

Transformers English

Deberta V3 Large Zeroshot V2.0 C

A DeBERTa-v3-large model specifically designed for efficient zero-shot classification, trained on fully commercially friendly synthetic data and NLI datasets, supporting GPU/CPU inference

Text Classification

Transformers English

Kf Deberta Base Cross Nli

A Korean natural language inference model based on the DeBERTa architecture, trained on the kor-nli and klue-nli datasets, supporting zero-shot classification tasks.

Text Classification

Transformers Korean

TeCoA is a vision-language model initialized from OpenAI CLIP, enhanced with supervised adversarial fine-tuning for improved robustness

CONCH is a vision-language foundation model for histopathology, pre-trained on 1.17 million pathology image-text pairs, demonstrating state-of-the-art performance in 14 computational pathology tasks.

Image-to-Text English

Japanese Clip Vit B 32 Roberta Base

A Japanese version of the CLIP model that maps Japanese text and images into the same embedding space, suitable for zero-shot image classification, text-image retrieval, and other tasks.

Transformers Japanese

Tinyclip ViT 39M 16 Text 19M YFCC15M

TinyCLIP is an innovative cross-modal distillation approach for large-scale language-image pre-trained models, achieving the optimal balance between speed and accuracy through affinity mimicking and weight inheritance techniques.

Fmops Distilbert Prompt Injection Onnx

This is the ONNX format conversion of the fmops/distilbert-prompt-injection model, designed for detecting prompt injection attacks.

Large Language Model

Transformers English

Roberta Base Nli

This model is a natural language inference model based on the RoBERTa architecture, specifically fine-tuned for depression detection tasks.

Text Classification

Transformers English

GIT is a Transformer-based image-to-text generation model capable of generating descriptive text from input images.

Transformers Supports Multiple Languages

Clip Vit Large Patch14

OpenAI's open-source CLIP model, based on Vision Transformer (ViT) architecture, supporting joint understanding of images and text.

A zero-shot image classification model based on OpenCLIP

Image Classification

mkaichristensen

Xlm Roberta Large Manifesto

A fine-tuned xlm-roberta-large model based on multilingual training data for zero-shot text classification, using the Manifesto Project coding scheme.

Text Classification

Transformers Other

This is a vision-language model based on the CLIP architecture, specifically post-trained on 80 million face images.

Multimodal Fusion

CLIP Giga Config Fixed

A large CLIP model trained on the LAION-2B dataset, using ViT-bigG-14 architecture, supporting cross-modal understanding between images and text

Pubmed Clip Vit Base Patch32

PubMedCLIP is a version of the CLIP model fine-tuned for the medical field, specifically designed to handle medical images and related text.

Text-to-Image English

flaviagiammarino

Git Base Finetune

GIT is a Transformer-based generative image-to-text model capable of converting visual content into descriptive text.

Transformers Supports Multiple Languages

Featured Recommended AI Models

AIbase

Empowering the Future, Your AI Solution Knowledge Base

English 简体中文繁體中文にほんご

© 2025AIbase